ABSTRACT

The power consumed within an integrated circuit (IC) is reduced without
substantial impact on its performance for typical applications by
throttling the performance of particular functional units within the IC.
Artificial worst-case power consumption is reduced by throttling down
the activity levels of long-duration sequences of high-power operations.
The recent utilization levels of particular functional units within an
IC are monitored, for example, by computing each functional unit's
average duty cycle over its recent operating history. If this activity
level is greater than a threshold, then the functional unit is operated
in a reduced-power mode. The threshold value is set large enough to
allow short bursts of high utilization to occur without impacting
performance. The invention allows an integrated circuit to dynamically
make the tradeoff between high-speed operation and low-power operation,
by throttling back performance of localized functional units when their
utilization exceeds a sustainable level. Additionally, this dynamic
power/speed tradeoff can be optimized across multiple functional units
within an IC or among multiple ICs within a system. Additionally, this
dynamic power/speed tradeoff can be altered by providing software
control over throttling parameters.

32 Claims, 5 Drawing Sheets 
PERFORMANCE THROTTLING TO REDUCE IC POWER CONSUMPTION

FIELD OF THE INVENTION

The invention relates generally to reducing the power consumption of
Integrated Circuits (ICs), and particularly of Very Large Scale
Integration (VLSI) ICs. In particular, it relates to methods and
apparatus for throttling the performance of particular functional units
within an IC as needed to control worst-case power consumption.

BACKGROUND OF THE INVENTION

Reducing the power consumed by an IC has significant advantages: (1)
Less power must be supplied to the IC; and (2) Less heat must be
dissipated by the IC and the devices surrounding it. Reducing power
consumption is especially important when an IC is going to be used in a
portable computing device, such as a hand-held or notebook-size digital
device.

Portable devices often operate for extended periods of time using only
the power supplied by an internal battery. Because the size, weight and
storage capacity of a portable battery is very limited, conserving power
is critical in portable devices. The less power its ICs consume, the
longer time the portable device can operate without changing or
recharging its batteries.

Further, portable devices generally must dissipate the heat that their
components generate without the assistance of the mechanical heat sinks
or radiators and cooling fans that can easily be used in a desk-top or
rack-mount computer system. When the ICs within a portable device
consume less power, it operates at a lower temperature. Elevated
temperatures within a computing device can make its components operate
unreliably or have shorter lifetimes.

The power consumed by an IC can be reduced by lowering the speed at
which it operates. For an IC fabricated using CMOS technology, which
dominates the manufacture of commercial ICs, the power the IC consumes
is directly proportional to both its clock rate and its operating
voltage. If either clock rate or voltage is reduced, then the power
consumed is reduced. Reducing the voltage also requires lowering the
clock rate, unless an offsetting improvement is made in the
manufacturing technology.

Because typically a fixed number of clock cycles is required to perform
a particular operation, approaches to reducing IC power consumption that
reduce the clock rate of the IC unfortunately also reduce performance.
Thus, there is a need to reduce the power consumed by an IC without
reducing its performance.

For many complex ICs, the power consumed varies widely with the task
that they are performing. If more of the circuit nodes within the IC
transition from one to zero or vice versa, then more power is consumed.
Thus, in order to determine the typical power consumption of an IC, it
is necessary to define a benchmark sequence of operations that
constitutes its typical usage. Such a benchmark would likely include
substantial amounts of idle time, because computing devices designed for
interactive use spend a large percentage of time waiting for user input.
Once such a benchmark suite of typical operations is defined, then the
power consumed by an IC in performing those operations can be measured
or estimated. Such a typical power consumption value would be useful,
for example, in estimating the battery life of a portable computing
device under normal usage.

It is desirable to reduce the power consumed by an IC by reducing or
eliminating node transitions in functional units within the IC that are
not being used during a particular sequence of operations. If an IC
shuts down functional units when they are not being used, then typical
power consumption can be significantly reduced with little or no impact
on performance.

However, shutting down functional units is likely to have [...]
worst-case power consumption, which often [...] performing sequences of
operations [...] functional units within the IC. Worst-case power
consumption is likely to be substantially [...] consumption.

Often particular functional units or logic blocks within an IC can be
identified that tend to consume a disproportionate share of the IC's
power, for example, the circuitry in a microprocessor that performs
floating-point arithmetic. The power consumed by a microprocessor is
significantly less if it is not called on to perform many floating-point
operations. The worst-case power consumption of a microprocessor might
involve a sequence of floating point operations that operates on data
values chosen to maximize node transitions from one to zero and vice
versa, and that executes repeatedly using cache memory within the
microprocessor so as to avoid reading or writing main memory.
Additionally, if the microprocessor performs speculative evaluations of
upcoming operations based on predicting which way a branch operation
will go, power consumption would be increased by increasing the
percentage of branch operations for which the microprocessor's
prediction is accurate. This is because an inaccurate prediction flushes
the instruction-execution pipeline, thus leaving some functional units
idle as the pipeline refills.

The designer of the system in which the IC is to be used must know what
the maximum power consumed by the IC will be for any possible sequence
of operations. In order to make a system that incorporates an IC robust,
the IC's maximum worst-case power must be known and specified. Reducing
the worst-case power consumption of an IC is very important for
reliability purposes, for heat dissipation purposes and for power-supply
capacity purposes. Thus, there is a need to reduce the worst-case power
consumed by an IC with little or no reduction in performance.

A worst-case sequence of operations, as described above, is important
for estimating worst-case power consumption, which is essential for the
above-mentioned purposes. But such a sequence can be considered
artificial, i.e., it may not be encountered in practical applications of
a microprocessor. For example, it is artificial to use a worst-case
power sequence based on lots of floating point computations in a
microprocessor to be used in a portable computing device where floating
point operations are infrequently used. It may not be important in
typical applications of portable computing devices that long sequences
of floating-point arithmetic be performed at maximum speed.

If the performance of typical operations is maintained, then it may be
acceptable to throttle back the performance of less typical or
artificial sequences of operations for the sake of reducing power. Thus,
there is a need to reduce the worst-case power consumed by an IC without
reducing performance for normal applications.

SUMMARY OF THE INVENTION

A novel method and apparatus for controlling power consumption within an
IC reduces worst-case power consumption without substantially lowering
performance for
typical applications. Worst-case power consumption is reduced by
throttling down the activity levels of long-duration sequences of
high-power operations.

Within any IC, a number of particular functional units can consume
inordinate amounts of power. For example, floating-point arithmetic
units and cache memories are two types of functional units within a
microprocessor IC that can consume substantial amounts of power. The
invention allows IC designers to identify any number of such high-power
functional units within the IC they are designing, and place each under
the control of its own power controller. Further, the invention allows
IC designers to place the IC they are designing as a whole under the
control of an overall power controller. In the case of a microprocessor
IC, the power consumption as a whole can effectively be throttled by
lowering either the instruction retirement rate or the instruction issue
rate.

In one embodiment the power controller comprises an activity monitor and
a mode controller. The activity monitor tracks the recent utilization
level of a particular functional unit within the IC, for example, by
computing its average duty cycle over its recent operating history. If
this activity level is greater than a threshold, then the mode
controller switches the functional unit to operate in a reduced-power
mode. The threshold value is set large enough to allow short bursts of
high utilization to occur without impacting performance.

Embodiments of the invention exist that add only minimal cost and
complexity to the IC's design, for example, one up-down counter and some
control circuitry per each functional unit being controlled. On the
other hand, the invention is flexible in that it encompasses a wide
variety of techniques for monitoring utilization of different functional
units, for reducing the power they consume and for setting their
throttling parameters.

In accordance with another aspect of the invention, the dynamic
power/speed tradeoff of the invention can be optimized across multiple
functional units within an IC or among multiple ICs within a system. The
invention includes optimization schemes wherein the maximum power
consumed by a particular functional unit can be increased or decreased
depending on the power being consumed elsewhere within the same IC, or
on other ICs within the same system.

In accordance with another aspect of the invention, the dynamic
power/speed tradeoff of the invention can be controlled by software,
such as platform software executing at system boot time, or operating
system software, or possibly even applications software.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is illustrated in the following drawings, in which known
circuits are shown in block-diagram form for clarity. These drawings and
the following textual description are for explanation and for aiding the
reader's understanding, but the invention should not be taken as being
limited to the preferred embodiments and design alternatives illustrated
therein.

FIG. 1(a) shows the blocks of logic circuitry of the invention.

FIG. 1(b) is a state diagram showing the transitions of a functional
unit from its normal mode or state to its reduced-power mode and back
again, according to the invention.

FIG. 2 shows the blocks of logic circuitry in an embodiment of the
invention that enforces a 50% maximum sustainable duty cycle on a
floating point functional unit.

FIG. 3 shows the blocks of logic circuitry in an embodiment of the
invention that enforces a programmable maximum sustainable duty cycle on
a cache memory.

FIG. 4 shows the blocks of logic circuitry in an embodiment of the
invention that disables instruction cache prefetching based on the
recent utilization level of the data cache.

FIG. 5 shows the blocks of logic circuitry in an embodiment of the
invention where a power coordinator reads the activity levels of various
functional units within an IC and alters, based on those activity
levels, the throttling parameters of other functional units to
dynamically optimize the power/speed tradeoff.

DETAILED DESCRIPTION OF THE INVENTION

Overview

The invention allows an IC to dynamically make the tradeoff between
high-speed operation and low-power operation, by throttling back
performance of a functional unit when its recent utilization exceeds a
sustainable level. Thus, the invention allows the IC to dynamically
throttle back the execution rate of maximum worst-case power consumption
sequences of operations so as to not exceed the worst-case power
consumption allowable, thus avoiding reliability, heat dissipation or
power supply problems.

At the same time, the invention minimizes any performance impact that
such throttling has on realistic sequences of operations. This power
reduction is done in a way that does not have a substantial effect on
the performance of the IC for typical tasks. The localized control and
the threshold value that the invention provides minimize performance
impacts. Further, the performance impact is predictable and repeatable
for those sequences of operations that the invention does throttle.

The purpose of an IC is not to run some artificial, non-realistic
maximum worst-case power consumption sequence of operations at high
performance. Rather, it is to run realistic or typical sequences of
operations at high performance. In some cases, there can be a
substantial difference in power consumption between the typical
worst-case power consumption and the artificial worst-case power
consumption. The effectiveness of adding the invention to a particular
IC design depends on the amount of difference between that design's
artificial worst-case power consumption and its typical worst-case power
consumption.

A preferred way to look at typical worst-case power consumption is to
look at realistic sequences of operations typically used to perform
actual work and identify from among those sequences the particular
sequence that maximizes power consumption. Such a sequence could be
determined by profiling the power consumption of sequences of operations
in a mix of popular software programs, and choosing from among those
sequences the sequence with the highest power consumption.

According to the present invention, artificial sequences of operations
that keep high-power functional units active for longer than a threshold
are performed in low power mode. Thus, the invention prevents the IC
from consuming power in excess of its specified maximum regardless of
the sequence of operations it is performing. This is critical in the
case of malicious software, such as a virus, that might deliberately
attempt to damage a microprocessor IC or the system that includes the
microprocessor by causing excess power consumption.

The invention is independent of the technique of reducing overall power
consumption by reducing voltage and/or clock rate. It can be used in
conjunction with that approach, or in lieu of that approach. For
example, if an IC would operate [...] lowered to reduce worst-case power
consumption; (ii) the invention could be employed to reduce worst-case
power consumption; or (iii) a combination of both techniques could be
employed. For some power-limited designs, using the invention could make
the difference in whether or not a particular target clock rate can be
met.

FIG. 1(a) is a block diagram of one embodiment of the invention.
Functional unit 105 provides current activity information 108 to
activity monitor 106. Current activity information 108 describes what
tasks or operations functional unit 105 is currently performing, or
indicates that it is currently idle. Based on this current activity
information 108, activity monitor 106 generates activity level 109, and
provides it to mode controller 107. Activity level 109 could be a
number, a set of signals each indicating that the activity level is
within a specified range, or even a single bit. Based on activity level
109, mode controller 107 generates mode control signal 110, which is
coupled to functional unit 105.

Mode controller 107 switches functional unit 105 between a normal mode
of operation 101 (typically one with high performance and high power
consumption), and a reduced-power mode 102 (typically one lower in
performance and lower in power consumption).

Activity monitor 106 monitors the recent utilization of functional unit
105, via activity level 109. Activity level 109 could be a special
signal generated by functional unit 105, or it could simply be the
commands that functional unit 105 receives and responds to. Monitoring
the recent utilization could consist of, for example, computing the
average duty cycle of the functional unit over the preceding thousand
cycles. If this activity level exceeds a threshold, then mode controller
107 places functional unit 105 in reduced-power mode. Further, if it is
desired to monitor the overall power consumption of an IC, then its
substrate temperature could be measured and this value used as the
activity level of the invention.

FIG. 1(b) is a state-transition diagram of the operation of the
invention. It shows how mode controller 107 causes functional unit 105
to transition between normal mode 101 and reduced-power mode 102. When
the functional unit is in normal mode 101 and the recent utilization is
greater than the threshold, then transition 103 occurs in which the mode
controller places the functional unit in reduced-power mode 102.
Similarly, when in reduced-power mode 102 and the recent utilization is
less than the threshold, then the mode controller takes transition 104
to restore the functional unit it controls to normal mode 101.

Preferably, the threshold value used is set based on profiling the
realistic worst-case power consumption benchmark being used in the
design of this particular IC. The threshold is preferably set large
enough that all or most bursts of high activity occurring in this
benchmark are shorter than this threshold, and thus can be speedily
executed with little or no throttling.

In the case where heat dissipation is the primary determinant of how
much power can be consumed, the threshold may be on the order of a
hundred thousand (100,000) operations. A spike in power consumption of
one millisecond (1 ms) may well be tolerable from a thermal point of
view. If the IC is clocked at 100 MHz, then a 1 ms spike is 100,000
clock cycles. A substantial amount of high-speed computation can be
performed in a high-power burst of 100,000 clock cycles. Thus, the
invention allows bursts of high activity, unless their duration exceeds
the threshold.

The invention is flexible in that it encompasses a wide range of methods
and devices for monitoring activity levels. These design alternatives
range from very simple to quite complex. In fact, each functional unit
controlled may have a different monitoring technique to which it is best
suited.

A particularly simple monitoring technique is to use an up/down counter
as an activity-level register whose contents indicate the current
utilization of the functional unit being monitored. In a simple
implementation, the up/down counter increments its contents by one
during each clock cycle that the functional unit is active and
decrements its contents by one during each clock cycle that the
functional unit is inactive. A slightly more complex design alternative
is to increment and decrement not once per clock cycle, but rather once
per each complex operation that the functional unit performs and
decrement for each corresponding period that the functional unit is
inactive. Another design alternative is for the activity monitor to
increment by a value other than one, to decrement by a value other than
one, or both.

If the value by which the contents of the activity-level register is
increased during each active cycle equals the value by which the
activity-level register is decreased during each inactive cycle, then
the activity monitor functions to enforce a maximum sustainable duty
cycle of fifty percent (50%). In an up-down counter implementation, care
must be taken that the contents of the activity-level register never go
below zero, or alternatively that a negative number as the value in the
activity-level register is distinguished from a roll-over condition in
which the value becomes too large in the positive direction.

The current value of the activity-level register is compared against a
threshold value. The threshold value is independent of the maximum
sustainable duty cycle. It is set so as to be large enough so that short
bursts of high activity can execute at full speed. Preferably, the
threshold value is set by profiling the sequence of operations selected
as the realistic worst-case power consumption benchmark. The threshold
value can be thought of as a deficit limit which the functional unit can
not exceed without having its speed throttled down. Carrying this
analogy further, the current value of the activity-level register can be
thought of as its current power deficit.

If a maximum sustainable duty cycle value other than fifty percent (50%)
is desired, then it is necessary to have the active increment be unequal
in magnitude to the inactive decrement. For example, an increment of two
and a decrement of one produce a thirty-three percent (33%) maximum
sustainable duty cycle. The sustainable duty cycle is given by
Equation 1:

Equation 1: $SDC = \frac{ID}{ID + AI}$

SDC represents the sustainable duty cycle, AI represents the active
increment amount and ID represents the inactive decrement amount. In
Equation 1, AI and ID are each positive and represent the absolute value
of the increment and decrement values actually used. Preferably the
active increment value is positive and the inactive decrement value is
negative.
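
As a rough check on the sustainable duty cycle relation, the following
Python sketch evaluates SDC = ID / (ID + AI) and compares it with a
cycle-by-cycle simulation of an activity-level register. The function
names, the always-busy workload, and the policy of forcing the unit idle
whenever the register is at or above the threshold are assumptions made
for illustration; they are not taken from the text.

```python
from fractions import Fraction


def sustainable_duty_cycle(ai, idec):
    """Evaluate SDC = ID / (ID + AI), with AI and ID as positive magnitudes."""
    if ai <= 0 or idec <= 0:
        raise ValueError("AI and ID must be positive magnitudes")
    return Fraction(idec, idec + ai)


def simulate_duty_cycle(ai, idec, threshold, cycles):
    """Simulate a functional unit that wants to be active on every cycle.

    Assumed policy (illustrative only): the unit runs while the register is
    below the threshold and is forced idle otherwise. The register is
    clamped so that it never goes below zero.
    """
    level = 0
    active_cycles = 0
    for _ in range(cycles):
        if level < threshold:
            level += ai                   # active cycle: add the increment
            active_cycles += 1
        else:
            level = max(0, level - idec)  # idle cycle: subtract the decrement
    return active_cycles / cycles


if __name__ == "__main__":
    print(sustainable_duty_cycle(2, 1))              # increment 2, decrement 1
    print(simulate_duty_cycle(2, 1, 100, 300_000))   # long-run measured duty cycle
```

With an increment of two and a decrement of one, the simulated long-run
duty cycle settles near one third, which agrees with the thirty-three
percent figure quoted for that pair of values.
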

If the active increment AI value is chosen to be one, then the maximum
number of consecutive cycles that the functional unit can be active is
equal to the threshold value. In general, the maximum burst length is
given by Equation 2:

Equation 2: $MBL = \frac{TH}{AI}$

MBL represents the number of functional-unit cycles in the maximum burst
length and TH represents the threshold value used to compare with the
current activity value.

More sophisticated activity monitoring schemes are possible within the
scope of the invention. For example, the type of operation the
functional unit is asked to perform could be monitored by an activity
monitor that associated a particular activity increment with each
possible type of operation. In such a scheme, the contents of the
activity-level register could simply decrement at a constant rate.

Design Alternatives for Reducing Power

The invention is flexible in that it encompasses a wide range of design
alternatives for reducing the power of the functional unit that it
controls. These design alternatives can range from very simple to quite
complex. In fact, each functional unit controlled may have a different
power reduction technique for which it is most suited.

A simple way to reduce the power consumed by the functional unit is to
reduce its clock rate. This could be performed by dividing the clock
which it normally receives by two, or by suppressing every other clock
pulse. In the case where the maximum sustainable duty cycle is fifty
percent, then dividing the clock provided to the functional unit by two
when the threshold is exceeded enforces this maximum duty cycle.
Alternatively, the clock rate could be reduced by a factor other than
two.

Many ICs include cache memory to keep an internal, and thus quickly
accessible, copy of data that is available at a slower speed in some
type of external memory. Cache memory is used for performance reasons.
Much less delay is involved in accessing the information from an on-chip
cache than in accessing it from a device external to the IC.

Cache memories can be major consumers of power within an IC. Thus, it
may be desirable to place on-chip cache memories under the control of
the invention. A simple scheme for reducing the power consumption of the
cache memory is to force access to the external memory (even if a copy
of the data is present in the on-chip cache), when necessary to reduce
power consumption because the cache memory's maximum sustainable duty
cycle has been exceeded.

It will be clear to one skilled in the art that an IC may have other
on-chip functional units whose functions can be performed at lower speed
by off-chip circuits. These are candidates for the same power reduction
technique as used for cache memory, that is, have the off-chip circuit
perform the operation when needed to reduce on-chip power consumption.

In rare cases it may be cost effective to include on an IC two complete
implementations of a particular functional unit, one being high speed
and high power and the other being low speed and low power. In this
case, the mode controller of the invention selects which is to be used
based on the current utilization of the functional unit and the current
value of its threshold parameter.

In the case of a microprocessor that performs speculative instruction
execution, instructions are started through the instruction-evaluation
pipeline anticipating that a conditional branch instruction will (or
will not) be taken. If the prediction as to whether or not the branch is
taken is correct then a significant performance speed-up is achieved.
But sometimes the branch prediction is wrong and as soon as this is
known, then the results of the speculative evaluations are discarded and
the correct instructions are started through the instruction-evaluation
pipeline. A preferred reduced-power mode for a microprocessor that
performs speculative instruction execution may be to reduce or eliminate
speculative instruction execution.

Another example of speculative operation is cache prefetching. Many ICs
with on-chip cache memories anticipate that instruction or data memory
accesses will be sequential. To increase performance, they prefetch to
the instruction or data cache some number of words adjacent to the
currently requested instruction or data address. A preferred
reduced-power mode for a cache memory may comprise disabling some or all
of its speculative prefetches. In general, a preferred reduced-power
mode for any functional unit may be to reduce or eliminate its
speculative activities.

Controlling a Floating-Point Arithmetic Unit by an Up/Down Counter

FIG. 2 shows an embodiment of the invention that enforces a maximum
sustainable duty cycle of fifty percent (50%) on floating-point unit
206. In its normal operating mode, multiplexer 203 passes system clock
201 on to the clock input of floating-point unit 206. In its
reduced-power mode, multiplexer 203 passes the output of divide-by-two
circuit 202 on to the clock input of floating-point unit 206, thus
cutting both its speed and power consumption in half.

Floating-point unit 206 provides active signal 207 to the up/down
control input of up/down counter 205. For each cycle of system clock 201
for which active signal 207 is true, up/down counter 205 increments its
contents by one. For each cycle of system clock 201 for which active
signal 207 is false, up/down counter 205 decrements its contents by one.
If a decrement would take its contents below zero, then up/down counter
205 stays at zero.

The select input of multiplexer 203 is driven by most-significant-bit
output 204 from up/down counter 205. This bit provides the feedback to
control whether or not the invention places floating point unit 206 into
reduced power mode. Thus in this embodiment, the threshold is
predetermined and must be a power of two. Which power of two is used is
selected by the number of bits in up/down counter 205. When the contents
of up/down counter 205 is large enough that its most significant bit is
a one, then the reduced-power mode is entered and multiplexer 203
selects the output of divide-by-two circuit 202 to clock floating-point
unit 206.

Clocking floating-point unit 206 at half the frequency has the effect of
enforcing a fifty percent (50%) maximum duty cycle on floating-point
unit 206 during the period that it is in reduced-power mode, i.e., the
period that most-significant-bit 204 is a one. During this period,
floating-point unit active signal 207 is true for every other cycle of
system clock 201.

The increment magnitude and decrement magnitude used in this embodiment
of the invention are equal; that is, the contents of up/down counter 205
are either increased or decreased by one. Therefore, the maximum
sustainable duty cycle allowed for floating-point unit 206 is fifty
percent (50%). Therefore, if the sequence of operations being performed
by the IC attempts to sustain a floating-point duty cycle of more than
fifty percent (50%) for longer than the burst allowed by the
predetermined threshold, then the floating point unit's performance is
throttled down to stay within a duty cycle of fifty percent.
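
The behavior described for FIG. 2 can be sketched as a small cycle-level
model in Python. This is only an illustration under stated assumptions:
the function name `simulate_fig2`, the `demand` callback, the choice of
clocking the unit on even system cycles while the clock is divided, and
saturating the counter at its maximum value to avoid roll-over are all
hypothetical details not given in the text.

```python
def simulate_fig2(counter_bits, demand, cycles):
    """Cycle-level model of an up/down counter throttling a floating-point unit.

    counter_bits: width of the up/down counter; the implied threshold is the
                  count at which the most significant bit becomes one,
                  i.e. 2 ** (counter_bits - 1).
    demand:       function mapping a cycle number to True when software
                  wants the floating-point unit busy on that cycle.
    Returns (active_cycles, reduced_power_cycles).
    """
    max_count = (1 << counter_bits) - 1
    msb_mask = 1 << (counter_bits - 1)
    count = 0
    active_cycles = 0
    reduced_power_cycles = 0
    for cycle in range(cycles):
        reduced = bool(count & msb_mask)      # MSB selects the divided clock
        if reduced:
            reduced_power_cycles += 1
            clocked = (cycle % 2 == 0)        # assumed phase of the divided clock
        else:
            clocked = True                    # full-rate system clock
        active = clocked and demand(cycle)
        if active:
            count = min(max_count, count + 1) # assumed saturation at the top
            active_cycles += 1
        else:
            count = max(0, count - 1)         # counter stays at zero, never below
    return active_cycles, reduced_power_cycles


if __name__ == "__main__":
    # Sustained demand with a 4-bit counter (threshold of 8).
    print(simulate_fig2(4, lambda c: True, 1000))
    # A short burst of 5 busy cycles never reaches the threshold.
    print(simulate_fig2(4, lambda c: c < 5, 100))
```

Under sustained demand the model runs at full speed for the first eight
cycles and is then held to every other cycle, while the five-cycle burst
completes without any throttling.
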

Controlling a Cache by a Programmable Activity Monitor

FIG. 3 shows an embodiment of the invention in which the power
consumption of a cache memory is controlled in a programmable manner.
The activity monitor in this embodiment of the invention functions as
follows: Cache active signal 316 is provided from the on-chip cache to
the select input of multiplexer 307. Based on the current state of cache
active signal 316, either the value in active increment register 304 or
the value in inactive decrement register 305 is presented as a first
operand input to adder 308. A second operand input to adder 308 is
provided from the current value of activity-level register 309. Adder
308 sums the values of these two operand inputs and provides the result
as the new current value to be stored in activity-level register 309.

The mode controller in this embodiment of the invention functions as
follows: The control signal cache unavailable 312 is the result of
comparator 310 determining that the contents of activity-level register
309 are larger than the contents of threshold register 306. When it
asserts control signal cache unavailable 312, the cache memory is put
into its reduced-power mode, i.e., accesses to it are denied. This
preferably forces the processor into an idle or wait state if it
attempts to reference the cache when cache unavailable 312 is asserted.
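
The data path described for FIG. 3 can be modeled in a few lines of
Python. The sketch below is a hedged illustration, not the patented
circuit: the class name `CacheThrottle`, the `run` helper, and the
clamping of the activity level at zero are assumptions introduced here.
The comments map each step to the numbered elements named in the text.

```python
from dataclasses import dataclass


@dataclass
class CacheThrottle:
    active_increment: int     # models active increment register 304 (positive)
    inactive_decrement: int   # models inactive decrement register 305 (negative)
    threshold: int            # models threshold register 306
    level: int = 0            # models activity-level register 309

    def cache_unavailable(self) -> bool:
        """Comparator 310: asserted when the level is larger than the threshold."""
        return self.level > self.threshold

    def step(self, cache_active: bool) -> None:
        """Multiplexer 307 selects an operand; adder 308 updates register 309."""
        operand = self.active_increment if cache_active else self.inactive_decrement
        # Clamping at zero is an assumption made to keep the register non-negative.
        self.level = max(0, self.level + operand)


def run(throttle, wants_access, cycles):
    """Count granted accesses; a denied access leaves the cache inactive."""
    granted = 0
    for cycle in range(cycles):
        access = wants_access(cycle) and not throttle.cache_unavailable()
        if access:
            granted += 1
        throttle.step(access)
    return granted


if __name__ == "__main__":
    t = CacheThrottle(active_increment=2, inactive_decrement=-1, threshold=10)
    print(run(t, lambda c: True, 36))   # granted accesses out of 36 requests
```

In this model a processor that requests the cache on every cycle gets an
initial burst of back-to-back accesses and is then granted roughly one
access in three, which matches an increment of two paired with a
decrement of one.
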

The values in threshold register 306, inactive decrement register 305,
and active increment register 304 can be programmable by a variety of
mechanisms not shown in FIG. 3. These values are the throttling
parameters associated with the functional unit being controlled. The
values in active increment register 304 and inactive decrement register
305 must be of opposite signs: one must be negative and one positive.

As explained above in connection with Equation 1, the 
values of active increment register 304 and inactive decre- 
ment register 305 can be selected to enforce a wide range of 
maximum duty cycles on the on-chip cache. Further, the 
value of threshold register 306 can be programmed to vary 
the maximum duration of bursts of high cache activity. This 
enables high performance on sequences of operations that 
require bursts of cache accesses, at least for those bursts of 
duration within tolerable power-consumption limits. 
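
The relationship between the three throttling parameters and the resulting limits can be made concrete with a short calculation. Equations 1 and 2 are not reproduced in this section, so the formulas below are derived independently from the register behavior described above and may differ in form from the patent's own; they assume a positive active increment, a negative inactive decrement, and an access being allowed whenever the level does not exceed the threshold. All function names are hypothetical.

```python
from fractions import Fraction


def max_sustainable_duty_cycle(active_increment: int, inactive_decrement: int) -> Fraction:
    """Fraction of cycles the unit can stay active with no net growth in level."""
    a, d = active_increment, -inactive_decrement
    if a <= 0 or d <= 0:
        raise ValueError("expected a positive increment and a negative decrement")
    # Steady state: D * a - (1 - D) * d = 0, hence D = d / (a + d).
    return Fraction(d, a + d)


def max_burst_cycles(threshold: int, active_increment: int, start_level: int = 0) -> int:
    """Consecutive active cycles allowed before the level first exceeds threshold."""
    if start_level > threshold:
        return 0
    # After k active cycles the level is start_level + k * a; cycle k + 1 is
    # still allowed while that level is <= threshold.
    return (threshold - start_level) // active_increment + 1


print(max_sustainable_duty_cycle(3, -1))  # 1/4
print(max_burst_cycles(6, 3))             # 3
```

Under these assumptions, an increment of 3 and a decrement of -1 cap sustained activity at one cycle in four, while a threshold of 6 still permits a three-cycle burst from an idle start.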

Design Alternatives for Programmability 

The invention is flexible in that it encompasses a wide 
range of design alternatives for programming or setting the 
contents of the throttling parameters associated with a 
particular functional unit, i.e., of threshold register 306, 
inactive decrement register 305, and active increment reg- 
ister 304. They could be read-only values programmed to the 
desired value like a read-only memory (ROM) by varying 
one or two of the mask layers used to fabricate the IC. They 
could be programmable read-only values programmed like 
a programmable read-only memory (PROM) by a one-time- 
only writing process such as blowing a fusible link for each 
bit. These design alternatives allow different versions of the 
IC with different power consumption and performance 
specifications. 

Alternatively, programming these values could be under 
software control, for example by the platform software 
or basic input/output system (BIOS) at system boot or 
power-on self test (POST) time, or dynamically under 
control of the operating system, or perhaps under 
limited dynamic control by the applications software. If 
programmed at system boot time, the choice of values could 
reflect the power supply and heat dissipation characteristics 
of the system in which the IC is used. For example, 
substantially different values could be used for a portable 
device versus a desk-top device. If programmed by appli- 
cations software, the choice of values could reflect the 
software setting itself a power-consumption budget rich in 
floating-point operations but meager in cache accesses, or 
vice versa. Preferably the hardware would enforce con- 
straints on any throttling parameter values set by software so 
as to maintain overall power consumption within 
specifications; for example, an increased value for one 
parameter might force an automatic decrease in another. 
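
One way such a constraint could behave is sketched below. The text does not specify any particular enforcement rule, so everything here is a hypothetical illustration: the budget is modeled as a cap on the sum of (duty-cycle cap times active power) over all units, and a request that would exceed it scales the other units' caps down proportionally.

```python
def set_duty_cap(caps, power, unit, requested, budget):
    """Return new duty-cycle caps after a software request for one unit.

    caps:   dict mapping unit name to its current duty-cycle cap in [0, 1]
    power:  dict mapping unit name to power drawn while active (arbitrary units)
    budget: maximum allowed sum of cap * power over all units (hypothetical rule)
    """
    new_caps = dict(caps)
    # Clamp the request so this unit alone can never exceed the budget.
    new_caps[unit] = max(0.0, min(requested, 1.0, budget / power[unit]))
    others = [u for u in new_caps if u != unit]
    remaining = budget - new_caps[unit] * power[unit]
    used = sum(new_caps[u] * power[u] for u in others)
    if used > remaining and used > 0:
        # Raising one parameter forces an automatic decrease in the others.
        scale = remaining / used
        for u in others:
            new_caps[u] *= scale
    return new_caps


caps = {"fpu": 0.5, "cache": 0.5}
power = {"fpu": 4.0, "cache": 2.0}
print(set_duty_cap(caps, power, "fpu", 0.625, budget=3.0))
```

In this example, raising the floating-point cap from 0.5 to 0.625 halves the cache cap to 0.25 so that the total stays at the budget of 3.0.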

Alternatively, the values could be altered dynamically as 
the IC operates as part of coordinating power consumption 
across multiple functional units as discussed below. In this 
context, the current value of activity-level register 309 can 
be considered one of the throttling parameters that can be 
decreased to give its associated functional unit a one-time 
performance boost or increased to give it a temporary 
restriction in power consumption. 

Coordinating Power Consumption Across Multiple 
Functional Units 

FIG. 4 is a block diagram of an embodiment of the 
invention for a microprocessor or other IC that includes both 
a data cache and an instruction cache, that performs specu- 
lative prefetches into the instruction cache, and that disables 
speculative instruction-cache prefetching based on the 
recent utilization of the data cache. 

An activity-level monitor and mode controller enforce a 
maximum sustainable duty cycle on accesses to a data cache 
(not shown). Blocks and signals 301 to 316 function as do 
their correspondingly numbered counterparts in FIG. 3. Thus, 
when the maximum sustainable duty cycle for data cache 
accesses is exceeded for more than the threshold duration, 
then the reduced-power mode is entered and the processor 
may be forced to be idle for a cycle or more if it attempts to 
reference the data cache. 

Additionally, comparator 402 compares the current value 
in activity-level register 309 with the value in prefetch 
threshold register 401. Based on the results of this 
comparison, the control signal throttle instruction-cache 
prefetch 403 is generated if the activity level exceeds this 
threshold. This control signal restricts speculative prefetches 
into the instruction cache (not shown), i.e., it disables all 
prefetches, or at least reduces their rate. 
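
The two comparisons against the same activity level can be sketched as follows. The function and key names are hypothetical, and the example values assume the prefetch threshold is set below the cache threshold so that prefetching is restricted first; the text does not require that ordering.

```python
def control_signals(activity_level: int, threshold: int, prefetch_threshold: int) -> dict:
    """Derive both FIG. 4 control signals from one activity-level value."""
    return {
        # Comparator 310 against threshold register 306.
        "cache_unavailable": activity_level > threshold,
        # Comparator 402 against prefetch threshold register 401.
        "throttle_icache_prefetch": activity_level > prefetch_threshold,
    }


for level in (3, 5, 9):
    print(level, control_signals(level, threshold=8, prefetch_threshold=4))
```

With these example values, a level of 3 asserts neither signal, a level of 5 throttles prefetching only, and a level of 9 asserts both.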

The premise of this embodiment is that as long as the 
activity level in the data cache is below a threshold, then 
speculative prefetches into the instruction cache should be 
performed. They may speed up execution and they are 
currently affordable in terms of power consumption. But if 
the cache activity level exceeds this threshold, then 
speculative prefetches into the instruction cache should not be 
performed. The power they consume is not currently afford- 
able. Further, the performance speed-up given up may be 
marginal because speculative prefetches have no benefit 
when the instruction accesses in a particular sequence of 
operations happen to be non-sequential. 

FIG. 5 is a block diagram of an embodiment of the 
invention that includes a power coordinator to dynamically 
optimize the power/speed tradeoff allowed by the invention 
across multiple functional units. In general, power coordi- 
nator 503 can read the activity levels associated with one or 
more functional units 501 within integrated circuit #1 and 
alter their throttling parameters, i.e., their associated active 
increment, inactive decrement, threshold or activity-level 
values. 

Integrated circuit #1 may include a number of additional 
functional units, such as Functional Unit #4, whose power 
consumption is not monitored, perhaps because it is con- 
stant or relatively small, or it is the preferred functional unit 
to throttle. A high-power functional unit with minimal 
performance impact might be a good candidate for being the 
first functional unit to throttle. 

Integrated circuit #1 may include a number of additional 
functional units, like Functional Unit #3, whose power 
consumption is not controlled, perhaps because the power 
consumed is relatively small, or there is no cost-effective 
technique for controlling its power consumption, or there is 
a very substantial performance impact of reducing its power 
consumption that makes it an unlikely candidate for such 
control. 

Based on current activity levels of any or all functional 
units being monitored, power coordinator 503 can alter the 
active increments and/or inactive decrements associated 
with a particular functional unit 501, thus changing that 
functional unit's maximum sustainable duty cycle. 

Similarly, power coordinator 503 can alter the threshold 
associated with a particular functional unit 501. This might 
be done to allow longer bursts of high activity in that 
particular functional unit. Alternatively, this might be done 
in conjunction with changing that unit's active increment 
value in order to keep the maximum high-power burst length 
constant as discussed in connection with Equation 2 above. 

Similarly, power coordinator 503 can alter the current 
activity level associated with a particular functional unit 
501, thus giving that functional unit a one-time performance 
boost or power restriction. 

As shown in IC #1 in FIG. 5, functional units #1 and #2 
have an associated activity monitor & mode controller 502. 
Functional unit #3 has an associated activity monitor 504, 
but no mode controller. Functional unit #4 has an associated 
mode controller 505, but no activity monitor. Activity moni- 
tor & mode controller 502, activity monitor 504 and mode 
controller 505 are different types of local power controllers, 
whose operation is coordinated by power coordinator 503 
altering their throttling parameters. 

For example, functional unit #1 could be an instruction 
cache and functional unit #2 could be a data cache, each with 
an associated activity monitor and power controller, similar 
to that shown in FIG. 3. Functional unit #3 could be a 
floating-point arithmetic unit. Functional unit #4 could be 
the unit that performs instruction cache prefetching, as 
discussed in conjunction with FIG. 4. 

In this example, power coordinator 503 could throttle or 
disable instruction cache prefetching based on whether or 
not the total of the current activity-level values within each 
activity monitor & mode controller 502 and each activity 
monitor 504 exceeds a threshold. The premise here is that 
speculative instruction fetching is the first activity to be 
throttled down, because the performance penalty paid by 
doing that is smaller than that of the reduced-power mode of 
the other functional units. 

Additionally in this example, power coordinator 503 
could raise or lower the maximum sustainable duty cycle for 
each of instruction cache functional unit #1 and data cache 
functional unit #2 based on whether or not the current 
activity level associated with floating-point functional unit 
#3 exceeds a threshold. The premise here is that floating- 
point functional unit #3 is never throttled because the 
performance penalty paid by doing that is larger than that 
paid by backing off on the maximum duty cycles of the 
caches. In practice this premise is true for some architectures 
running some types of applications, and for other archi- 
tectures or types of applications it would be the other way 
around. The monitoring and control schemes of the present 
invention can accommodate a wide range 
of such variations. 

Additionally in this example, power coordinator 503 
could raise or lower the maximum sustainable duty cycle for 
instruction cache functional unit #1 and the data cache 
functional unit #2 based on whether or not the current 
activity level associated with the other cache exceeds a 
threshold. The premise here is that if one is relatively 
inactive, then a higher sustained duty cycle can be afforded 
in the other. 

The validity of these or any other premises about the best 
techniques for optimizing performance in the context of 
worst-case power conservation is preferably determined by 
profiling the realistic worst-case power benchmark 
described above. Based on such profiling, estimates and 
performance tradeoffs can be made between the normal 
mode and the reduced-power mode of each functional unit 
being controlled by the invention. Each such premise can be 
validated by simulating it against the realistic worst-case 
power benchmark. 

Also shown in FIG. 5 is system power coordinator 506 
and integrated circuit #2, which show how the dynamic 
power/speed tradeoff of the invention can be hierarchically 
extended to the level of multiple-IC systems. Each inte- 
grated circuit 500 provides power consumption information 
to system power coordinator 506. System power coor- 
dinator 506 provides power consumption commands 507 to 
each integrated circuit 500. The possible interactions 
between system power coordinator 506 and each power 
coordinator 503 within each IC 500 are analogous to the 
interactions discussed above between power coordinator 
503 and monitors/mode controllers 502, 504 and 
505. 

For example, a personal computer system could comprise 
a microprocessor IC and one or more peripheral controller 
ICs: a display controller IC, a modem communications IC, 
a disk controller IC, etc. The system power coordinator 
could raise or lower the power that the microprocessor can 
currently consume based on the recent utilization of the 
peripheral controller ICs. 

The preferred embodiment of the invention and various 
alternative embodiments and designs are disclosed herein. 
Nevertheless, various changes in form and detail may be 
made while practicing the invention without departing from 
its spirit and scope or from the following claims. 

We claim: 

1. A microprocessor with controlled power consumption, 
comprising: 

a storage unit configured to store a dynamically alterable 
activity threshold; 

a floating-point unit operable to compute floating point 
arithmetic in a normal mode and in a reduced-power 
mode; 

an activity monitor, coupled to said storage unit operable 
to monitor utilization of said floating-point unit; and 

a mode controller, coupled to said storage unit and to said 
activity monitor, operable to place said storage unit in 
said reduced-power mode when said utilization is 
greater than said activity threshold. 

2. The microprocessor of claim 1, further comprising: 
a unit, coupled to said floating-point unit and responsive 
to a power-reduction signal from said mode controller, 
to reduce the rate at which said floating-point unit is 
clocked when said floating-point unit is in said 
reduced-power mode. 

3. A microprocessor with controlled power consumption, 
comprising: 

an activity threshold memory configured to store a 
dynamically alterable threshold; 

a cache memory operable to store information also stored 
in an external memory and operable in a normal mode 
and in a reduced-power mode; 

an activity monitor, coupled to said cache memory, oper- 
able to produce an activity level indicative of utilization 
of said cache memory; and 

a mode controller, coupled to said activity threshold 
memory and to said activity monitor, operable to place 
said cache memory in said reduced-power mode when 
said activity level is greater than said activity threshold. 

4. The microprocessor of claim 3, further comprising: 
a unit, coupled to said cache memory and responsive to a 
power-reduction signal from said mode controller, 
operable to access said external memory when said 
cache memory is in said reduced-power mode. 

5. The microprocessor of claim 3, further comprising: 
a unit, coupled to said cache memory and responsive to a 
power-reduction signal from said mode controller, 
operable to throttle prefetches of said information into 
said cache memory when said cache memory is in said 
reduced-power mode. 

6. The microprocessor of claim 3 wherein said stored 
information comprises instructions, and said microprocessor 
further comprises: 

an instruction-execution unit, coupled to said cache 
memory, operable to execute said instructions. 

7. The microprocessor of claim 3 wherein said stored 
information comprises data, and said microprocessor further 
comprises: 

a data-computation unit, coupled to said cache memory, 
operable to perform computations on said data. 

8. A microprocessor with controlled power consumption, 
comprising: 

a data-computation unit, operable to perform computa- 
tions on data stored in an external memory; 

a data cache, coupled to said data-computation unit and to 
said external memory, operable to store said data; 

an activity monitor, coupled to said data cache, operable 
to indicate the recent utilization of said data cache; 

an instruction-execution unit, operable to execute instruc- 
tions from said external memory; 

an instruction cache, coupled to said instruction-execution 
unit and to said external memory, operable to store said 
instructions and operable in a normal mode and in a 
reduced-power mode; and 

a mode controller, coupled to said instruction cache and to 
said activity monitor, operable to place said instruction 
cache in said reduced-power mode when said recent 
utilization is greater than a threshold. 

9. A microprocessor with controlled power consumption, 
comprising: 

an instruction-execution unit, operable to speculatively 
execute instructions; 

an activity monitor, coupled to said instruction-execution 
unit, operable to indicate the recent utilization of said 
instruction-execution unit; and 

a mode controller, coupled to said instruction-execution 
unit and to said activity monitor, operable to throttle 
said speculative instruction execution, and not to 
throttle non-speculative instruction execution, when 
said recent utilization is greater than a threshold. 

10. An IC having controlled power consumption, com- 
prising: 

a functional unit operable in a normal mode and in a 

reduced-power mode; 
an activity monitor, coupled to said functional unit, oper- 
able to indicate utilization of said functional unit; and 
a mode controller, coupled to said functional unit and to 
said activity monitor, operable to place said functional 
unit in said reduced-power mode when said utilization 
is greater than a threshold indicated by a throttling 
parameter associated with said functional unit, said 
throttling parameter being dynamically adjustable to 
alter said threshold. 

11. The IC of claim 10, further comprising: 
a unit, coupled to said functional unit and responsive to 
said mode controller, operable to reduce the rate at 
which said functional unit is clocked when said func- 
tional unit is in said reduced-power mode. 

12. The IC of claim 10 wherein said functional unit 
comprises a cache memory to store information also stored 
in an external memory, and said IC further comprises: 

a unit, coupled to said functional unit and responsive to 
said mode controller, operable to access external 
memory when said functional unit is in said reduced- 
power mode. 

13. The IC of claim 10 wherein said functional unit 
comprises a cache memory to store information from an 
external memory, and said IC further comprises: 

a unit, coupled to said functional unit and responsive to 
said mode controller, operable to throttle prefetches 
from said external memory to said cache memory when 
said functional unit is in said reduced-power mode. 

14. The IC of claim 10, wherein said functional unit is 
operable to perform speculative operations, and said IC 
further comprises: 

a unit, coupled to said functional unit and responsive to 
said mode controller, operable to throttle said specula- 
tive operations when said functional unit is in said 
reduced-power mode. 

15. The IC of claim 10, wherein 
said functional unit is operable in cycles; and 
said activity monitor indicates said utilization by chang- 
ing an activity level by a first amount for each of said 
cycles that said functional unit is active and changing 
said activity level by a second amount for each of said 
cycles that said functional unit is inactive, said throt- 
tling parameter comprising said activity level. 

16. The IC of claim 15, further comprising: 
a power coordinator operable dynamically to alter said 
throttling parameter. 

17. An integrated circuit with controlled power 
consumption, comprising: 

a functional means for performing a function operable in 
a normal mode and in a reduced-power mode; 

a monitor means, coupled to said functional means, for 
calculating an activity level indicative of utilization of 
said functional means; and 

a control means, coupled to said functional means and to 
said monitoring means, operable to place said func- 
tional means in said reduced-power mode when said 
activity level is greater than a predetermined threshold, 
the predetermined threshold being dynamically adjust- 
able. 

18. A method of controlling power consumption within an 
integrated circuit (IC), comprising: 

monitoring utilization of a functional unit within said IC; 

dynamically adjusting an activity threshold associated 
with the functional unit; 

comparing said utilization with said activity threshold; 

placing said functional unit in a normal mode when said 
utilization is less than said activity threshold; and 

placing said functional unit in a reduced-power mode 
when said utilization is greater than said activity thresh- 
old. 

19. The method of claim 18, wherein said reduced-power 
mode comprises reducing the rate at which said functional 
unit is clocked. 

20. The method of claim 18, wherein said functional unit 
comprises a cache memory and said step of placing said 
cache memory in said reduced-power mode comprises 
diverting accesses to the cache memory to an external 
memory. 

21. The method of claim 18, wherein said functional unit 
performs speculative operations and said step of placing said 
functional unit in said reduced-power mode comprises throt- 
tling execution of said speculative operations. 

22. The method of claim 18, wherein said monitoring 
comprises changing an activity level by a first amount for 
each cycle in which said functional unit is active and 
changing said activity level by a second amount for each 
cycle in which said functional unit is inactive. 

23. The method of claim 22, further comprising: 
dynamically determining said first amount. 

24. The method of claim 22, further comprising: 
dynamically determining said second amount. 

25. An integrated circuit (IC) having controlled power 
consumption, comprising: 

a first functional unit within said IC; 

a second functional unit within said IC, said second 
functional unit being operable in a normal mode and in 
a reduced-power mode and having an activity level 
related to an activity level of the first functional unit; 

an activity monitor within said IC, coupled to said first 
functional unit, operable to produce an activity level 
indicative of the activity level of said first functional 
unit; and 

a controller within said IC, coupled to said second func- 
tional unit and to said activity monitor, operable to 
place said second functional unit in said reduced-power 
mode when said activity level of said first functional 
unit is greater than a threshold. 

26. An integrated circuit (IC) having controlled power 
consumption, comprising: 

a plurality of functional units each operable in a normal 
mode and in a reduced-power mode; 

a plurality of local power controllers, each associated with 
at least one of said functional units and each having 
throttling parameters, operable to control the power 
consumption of said associated functional unit in accor- 
dance with the current values of said throttling param- 
eters; and 

a power coordinator, coupled to at least two of said local 
power controllers, operable to read a throttling param- 
eter in a first one of said coupled local power control- 
lers and, based thereon, operable to alter a throttling 
parameter in a second one of said coupled local power 
controllers. 

27. A method of controlling the power consumption of an 
integrated circuit (IC), comprising: 

monitoring utilization of a first functional unit within said 
IC to produce a first activity level; and 

controlling the mode of operation of a second functional 
unit within said IC, said second functional unit having 
a second activity level related to said first activity level 
of said first functional unit and being operable in a 
normal mode and in a reduced-power mode by placing 
said second functional unit in said reduced-power mode 
when said first activity level is greater than a threshold. 

28. A method of controlling power consumption of an 
integrated circuit (IC), comprising: 

controlling the mode of operation of a plurality of func- 
tional units within said IC based on a set of throttling 
parameters, each of said functional units being operable in 
a normal mode and in a reduced-power mode; and 

coordinating the power consumption of said plurality of 
functional units by dynamically monitoring and alter- 
ing said throttling parameters. 

29. An integrated circuit (IC) with power consumption 
controllable by a processor, comprising: 

a functional unit operable in a normal mode and in a 
reduced-power mode; 

an activity monitor, coupled to said functional unit hav- 
ing a throttling parameter, operable to generate an 
activity level indicative of utilization of said functional 
unit; 

a memory unit configured to store a throttling parameter, 
said throttling parameter being dynamically alterable 
by said processor; and 

a mode controller, coupled to said functional unit and to 
said activity monitor, operable to control said mode of 
said functional unit responsive to said activity level and 
to said throttling parameter. 

30. The IC of claim 29, wherein 

said command occurs responsive to platform software 
executing on said processor. 

31. The IC of claim 29, wherein 

said command occurs responsive to operating system 
software executing on said processor. 

32. The IC of claim 29, wherein 

said command occurs responsive to applications software 
executing on said processor. 
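
As an illustration of the coordination described in connection with FIG. 5 and recited in claim 26, the following sketch shows a coordinator reading a throttling parameter in one local power controller and, based on it, altering a parameter in another. It is not part of the claims; the class, function and parameter names are hypothetical, and the specific policy (tightening the cache's active increment while the floating-point unit is busy) is only one of the premises discussed in the description.

```python
from dataclasses import dataclass


@dataclass
class LocalPowerController:
    """Throttling parameters held by one local power controller (hypothetical)."""
    active_increment: int
    inactive_decrement: int
    threshold: int
    activity_level: int = 0


def coordinate(fpu, cache, fpu_busy_threshold, strict_increment, relaxed_increment):
    """Read the FPU's activity level and alter the cache's active increment."""
    fpu_busy = fpu.activity_level > fpu_busy_threshold
    # A larger active increment lowers the cache's sustainable duty cycle.
    cache.active_increment = strict_increment if fpu_busy else relaxed_increment
    return cache.active_increment


fpu = LocalPowerController(2, -1, threshold=10, activity_level=12)
cache = LocalPowerController(3, -1, threshold=6)
print(coordinate(fpu, cache, fpu_busy_threshold=8, strict_increment=5, relaxed_increment=2))
```

Here the floating-point unit's level of 12 exceeds the busy threshold of 8, so the cache's active increment is raised to 5; once the floating-point unit goes quiet, the same call would relax it to 2.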

* * * * * 